MuLAS: a framework for automatically building multi-tier corpora

Authors

  • Sérgio Paulo
  • Luís C. Oliveira
Abstract

The Multi-Level Alignment System (MuLAS) is the L2F tool for building multi-tier speech corpora with little or no human intervention. MuLAS automatically combines information from external speech annotations, whether human- or machine-generated, with the text-based utterance descriptions that it creates, in order to build more reliable and complete descriptions of the spoken utterances. This paper presents the methods for multi-tier annotation synchronization that underlie MuLAS. These methods have allowed us to extend the building of multi-tier corpora to new languages without much additional effort. MuLAS has been successfully applied to building multi-tier corpora for speech synthesis in American and British English, European Portuguese, and German. Natural prosody generation has also benefited, since prosodic models can be derived from corpora built by MuLAS.

Similar resources

Multi-Cache Coherence Protocol for Distributed Internet Services

Multi-tier architectures provide a means for building scalable distributed services. Caching is a classical technique for enhancing the performance of systems (e.g. database servers, or web servers). Although caching solutions have been successfully studied for individual tiers of multi-tier systems, if collectively applied, these solutions may violate the coherence of cached data. This paper p...

OMS Java: Lessons Learned from Building a Multi-Tier Object Management Framework

We present the object-oriented multi-tier application framework OMS Java which is independent of the underlying database management system (DBMS). We detail the storage management component and sketch which part of the framework has to be extended when introducing a new DBMS. We compare versions of OMS Java using the persistent storage engine ObjectStore PSE Pro for Java, the object-oriented DB...

A Framework for Institutions Governing Institutions

Norms guide multi-agent systems away from being potentially anarchic towards a coordinated and collaborative society. Institutions provide an explicit, external representation of norms as well as the means to detect violations and other conditions. Each institution can be crafted individually to capture their designers’ goals, but this creates a challenge at higher levels of authority in guidin...

CorpusReader: designing and querying multi-layer corpora

CorpusReader is a framework for creating and querying multi-layer corpora, which contain several levels of analysis (morphology, syntax, semantics, etc.) and which are aimed at observing correlations between these levels. Building, representing and querying multi-layer corpora is complex. CorpusReader’s specificity essentially lies in merging the outputs of existing corpus analysis tools, avoid...

An Efficient Framework for Extracting Parallel Sentences from Non-Parallel Corpora

Automatically building a large bilingual corpus that contains millions of words is always a challenging task. In the case of low-resource languages in particular, it is difficult to find an existing parallel corpus that is large enough for building a real statistical machine translation system. However, comparable non-parallel corpora are richly available on the Internet, such as in Wikipedia...

Publication date: 2007